Conceptual data sampling for image segmentation- an application for breast cancer images

Author: Awan Zainab Khalid
Publication date: 01/06/2017

Data analytics has become a buzzword in the information technology sector. There are various ways to analyze data: deploying sophisticated technologies to process big data, or using commodity hardware while applying data reduction/sampling techniques to draw meaningful insights from the data. In this thesis, we aim to reduce data size in terms of the number of tuples/objects in a given dataset. Our method has its roots in formal concept analysis (FCA), a mathematical framework for data analysis. The proposed transformation preserves the functional dependencies/implications in a database. Consequently, we can generate a much smaller data sample that can still support decision making. In this study, we analyze a variety of reduction methods, including randomized object selection procedures, in order to identify the best one(s). The accuracy of the decisions made on the generated sample is comparable to the accuracy of the decisions made on the whole/original data. To illustrate the concept, we have chosen data from the medical imaging domain. The data used for experimentation contains microscopic images of breast cancer that need to be segmented into two categories, i.e. benign or malignant. An extensive set of experiments has been performed to show the strength of the proposed reduction method.
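As a rough illustration of what preserving implications under object reduction means in FCA, the sketch below checks whether an implication holds in a formal context and applies the simplest possible reduction: dropping objects whose attribute sets duplicate an earlier object. This is not the thesis's method, which the abstract does not specify. The image names, attribute names, and the helper functions `holds`, `drop_duplicate_objects`, and `valid_implications` are all hypothetical.

```python
from itertools import combinations

# Toy formal context: object -> set of attributes (all names are hypothetical).
context = {
    "img1": {"irregular", "dense", "malignant"},
    "img2": {"irregular", "dense", "malignant"},
    "img3": {"regular", "benign"},
    "img4": {"regular", "benign"},
    "img5": {"regular", "dense", "benign"},
}
attributes = set().union(*context.values())


def holds(ctx, premise, conclusion):
    """An implication premise -> conclusion holds if every object that has
    all premise attributes also has all conclusion attributes."""
    return all(conclusion <= attrs for attrs in ctx.values() if premise <= attrs)


def drop_duplicate_objects(ctx):
    """Keep the first object for each distinct attribute set.
    Assumption: this trivial reduction stands in for the thesis's method."""
    seen, reduced = set(), {}
    for obj, attrs in ctx.items():
        key = frozenset(attrs)
        if key not in seen:
            seen.add(key)
            reduced[obj] = attrs
    return reduced


def valid_implications(ctx, attrs, max_premise=2):
    """Enumerate valid implications with small premises and one target attribute."""
    found = set()
    for size in range(1, max_premise + 1):
        for premise in combinations(sorted(attrs), size):
            for target in attrs:
                if target not in premise and holds(ctx, set(premise), {target}):
                    found.add((premise, target))
    return found


reduced = drop_duplicate_objects(context)
same = valid_implications(context, attributes) == valid_implications(reduced, attributes)
print(len(context), "->", len(reduced), "objects; implications preserved:", same)
```

Removing duplicates leaves the set of distinct attribute sets unchanged, so every implication that holds in the full context also holds in the sample, and vice versa. A more aggressive reduction would need a stronger argument than this one.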

Qatar University Institutional Repository

Biomedical Information Extraction with Deep Neural Models

Author: Awan Zainab Khalid
Publication date: 01/01/2021

University of Technology Sydney. Faculty of Engineering and Information Technology. Biomedical literature contains a wealth of knowledge in the form of unstructured articles and patents. Scientists find it hard to keep up to date with the literature being published. To further research and avoid repetition, published literature must be reviewed. Structured knowledge bases allow easy access to knowledge by avoiding manual screening of text documents. Knowledge base construction requires curation of literature, either manually or automatically. Manual curation of published literature for acquiring knowledge is tedious and time-consuming. Furthermore, manual curation cannot keep up with the rapidly growing literature, which calls for research into tools that automatically extract information from research articles. This thesis aims to identify entities and relations specific to the ChEBI ontology in publication abstracts. This includes identifying species, metabolites, proteins and chemicals and their relations, namely 'Metabolite of', 'Associated With', 'Isolated From' and 'Binds With'. Current approaches for biomedical information extraction rely on syntactic rules, dictionary matching or domain-specific features, resulting in highly specialised and often non-generalisable approaches. Deep learning methods, on the other hand, are capable of feature extraction. This thesis proposes deep learning methods for named entity recognition/normalisation and relation extraction. A knowledge graph has been constructed for storing and querying the extracted knowledge. This thesis makes three contributions to knowledge: deep contextualized neural embeddings for ChemNER, bi-encoder-based learning to rank for entity normalisation, and pre-trained transformers for ChEBI relation extraction.
Contribution 1 proposes and evaluates improved word representations for named entity recognition using the Bi-LSTM-CRF network by including embeddings from language models in its input representations. The proposed method is evaluated on two abstract and two patent corpora and establishes state-of-the-art results on the abstract corpora. Contribution 2 develops and evaluates a transformer-based ranking method, based on the BERT architecture, for the named entity normalisation task of linking species to the NCBI taxonomy. Species mentions are linked to NCBI taxonomy identifiers by generating candidates with the information retrieval algorithm BM25 and then re-ranking them based on encoder representations from transformers. The proposed method has been evaluated on the S800 and LINNAEUS corpora and outperforms existing methods for species normalisation. Contribution 3 proposes and evaluates transformer-based models for ChEBI relation extraction. Finetuning and task-specific feature extraction approaches are proposed and compared. Empirical evidence suggests that finetuning is better when the target data is small.
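The two-stage normalisation pipeline described for Contribution 2 (BM25 candidate generation followed by re-ranking) can be sketched roughly as follows. This is a minimal sketch, not the thesis's implementation. BM25 is written out by hand in a common non-negative IDF variant, and the re-ranker is a character-trigram Jaccard similarity standing in for the transformer encoder. The dictionary is a toy, the identifiers `T1` to `T5` are placeholders rather than real NCBI taxonomy IDs, and names such as `BM25`, `similarity`, and `normalise` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 over a small list of dictionary names."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avgdl = sum(self.lengths) / len(docs)
        df = Counter(t for c in self.docs for t in c)  # document frequency
        n = len(docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for t in tokenize(query):
            if t in tf:
                norm = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k):
        """Indices of the k best-scoring documents with a non-zero score."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored = [(s, i) for s, i in scored if s > 0]
        return [i for s, i in sorted(scored, key=lambda p: -p[0])[:k]]


def trigrams(text):
    s = f"  {text.lower()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def similarity(a, b):
    """Stand-in re-ranker (assumption): trigram Jaccard, not a transformer encoder."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def normalise(mention, names, ids, bm25, k=3):
    """Stage 1: BM25 candidates. Stage 2: re-rank candidates by similarity."""
    candidates = bm25.top_k(mention, k)
    if not candidates:
        return None
    best = max(candidates, key=lambda i: similarity(mention, names[i]))
    return ids[best]


# Toy dictionary with placeholder identifiers (not real NCBI taxonomy IDs).
names = ["Homo sapiens", "Mus musculus", "Mus spretus", "Rattus norvegicus", "Escherichia coli"]
ids = ["T1", "T2", "T3", "T4", "T5"]
bm25 = BM25(names)

print(normalise("mus musculus", names, ids, bm25))
```

The split keeps the expensive model off most of the dictionary: BM25 cheaply narrows the taxonomy to a few candidates, and only those are scored by the re-ranker. Swapping `similarity` for a learned encoder would not change the surrounding structure.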

OPUS - University of Technology Sydney